SOMERSD
Overview
The SOMERSD function calculates Somers’ D, an asymmetric measure of ordinal association between two ranked variables. Named after Robert H. Somers, who introduced it in 1962, this statistic quantifies the degree to which two ordinal variables move together, accounting for ties in the independent variable. Somers’ D is widely used in rank statistics, logistic regression model evaluation, and credit scoring applications.
Like Kendall’s tau, Somers’ D measures the correspondence between two rankings by comparing concordant pairs (where both rankings agree on the ordering) and discordant pairs (where the rankings disagree). A pair of observations (x_i, y_i) and (x_j, y_j) is concordant if x_i > x_j and y_i > y_j, or if x_i < x_j and y_i < y_j. The pair is discordant if the inequalities point in opposite directions. A pair with x_i = x_j or y_i = y_j is tied and counts as neither.
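The pair classification above can be sketched in a few lines of Python. This is a minimal illustration, not the SciPy implementation; the helper name `count_pairs` is hypothetical, and the data are the x and y columns from Example 4.

```python
from itertools import combinations


def count_pairs(x, y):
    """Count concordant, discordant, and tied pairs (hypothetical helper)."""
    concordant = discordant = tied = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        # The sign of the product tells whether both orderings agree.
        product = (xi - xj) * (yi - yj)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1  # tied on x, on y, or on both
    return concordant, discordant, tied


x = [1, 2, 3, 4, 5]
y = [3, 1, 5, 2, 4]
print(count_pairs(x, y))  # (6, 4, 0)
```

With no ties, (6 - 4) / 10 = 0.2, which matches the statistic reported for Example 4.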
The key distinction from Kendall’s tau is how the statistic handles normalization. Somers’ D is defined in terms of Kendall’s \tau_a as:
D(Y|X) = \frac{\tau_a(X, Y)}{\tau_a(X, X)}
where \tau_a(X, X) is the proportion of pairs with unequal X values. Equivalently, D(Y|X) is the number of concordant pairs minus the number of discordant pairs, divided by the number of pairs not tied on X. This formulation makes Somers’ D asymmetric (D(Y|X) \neq D(X|Y) in general), treating X as the independent variable and Y as the dependent variable. The statistic ranges from -1 (complete disagreement) to +1 (complete agreement).
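To make the asymmetry concrete, the following sketch computes D(Y|X) by direct pair counting and then swaps the roles of the two variables. The function name `somers_d` and the sample data are assumptions made for illustration; the quadratic loop is for clarity only and is not how SciPy computes the statistic.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D(Y|X) by direct pair counting (illustrative, O(n^2))."""
    numerator = 0    # concordant minus discordant pairs
    untied_on_x = 0  # pairs with unequal x values
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # pairs tied on x are excluded from the denominator
        untied_on_x += 1
        product = (xi - xj) * (yi - yj)
        numerator += (product > 0) - (product < 0)
    # Raises ZeroDivisionError if every x value is identical.
    return numerator / untied_on_x


x = [1, 1, 1, 2, 2]
y = [1, 2, 3, 2, 3]
d_y_given_x = somers_d(x, y)  # 2 / 6: six pairs differ in x
d_x_given_y = somers_d(y, x)  # 2 / 8: eight pairs differ in y
print(d_y_given_x, d_x_given_y)
```

The numerator is the same in both directions; only the tie correction in the denominator changes, which is why the two values differ.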
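When one variable is a binary outcome and the other is a predicted score, Somers’ D has a direct relationship to the area under the ROC curve (AUC). The relationship holds when D is normalized by the pairs that differ in the outcome, so in the notation above the binary outcome plays the role of X and the score plays the role of Y:
\text{AUC} = \frac{D(Y|X) + 1}{2}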
This connection makes it particularly valuable for evaluating binary classification and predictive models.
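The AUC relationship can be checked numerically with a small sketch. It assumes the binary labels are passed as the first (conditioning) variable so that D is normalized by the pairs that differ in label; `somers_d` and `auc_by_pairs` are hypothetical helpers and the data are made up for illustration.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D(Y|X) by direct pair counting (illustrative, O(n^2))."""
    numerator = 0
    untied_on_x = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # skip pairs tied on the conditioning variable
        untied_on_x += 1
        product = (xi - xj) * (yi - yj)
        numerator += (product > 0) - (product < 0)
    return numerator / untied_on_x


def auc_by_pairs(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    # A tied score counts as half a correct ranking.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
d = somers_d(labels, scores)        # 0.5
auc = auc_by_pairs(labels, scores)  # 0.75
print(auc, (d + 1) / 2)
```

Swapping the arguments to `somers_d(scores, labels)` normalizes by pairs with distinct scores instead, which gives the same value only when no scores are tied.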
This implementation uses SciPy’s somersd function, which accepts either two 1D arrays of rankings or a 2D contingency table. The function returns both the Somers’ D statistic and a p-value computed using an asymptotic approximation under the null hypothesis D = 0. For more background, see the Wikipedia article on Somers’ D and Somers’ original paper “A New Asymmetric Measure of Association for Ordinal Variables” (1962).
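As a brief sketch of the two input forms, the snippet below calls `scipy.stats.somersd` directly, first with two 1D sequences (the data from Example 4) and then with a contingency table (the table from Example 1). It assumes a SciPy version that provides `somersd` (1.7 or later).

```python
from scipy.stats import somersd

# Form 1: two 1D sequences of paired observations.
res_xy = somersd([1, 2, 3, 4, 5], [3, 1, 5, 2, 4])
print(res_xy.statistic, res_xy.pvalue)

# Form 2: a 2D contingency table of counts, where rows index the
# categories of x and columns index the categories of y.
table = [[27, 25, 14, 7, 0],
         [7, 14, 18, 35, 12],
         [1, 3, 2, 7, 17]]
res_table = somersd(table)
print(res_table.statistic, res_table.pvalue)

# A one-sided alternative is selected with the `alternative` keyword.
res_greater = somersd(table, alternative='greater')
print(res_greater.pvalue)
```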
This example function is provided as-is without any representation of accuracy.
Excel Usage
=SOMERSD(x, y, somersd_alternative)
x (list[list], required): Either a 2D list of rankings (as a column vector) or a 2D contingency table.
y (list[list], optional, default: null): 2D list of rankings (as a column vector) with the same number of rows as x. Ignored if x is a contingency table.
somersd_alternative (str, optional, default: "two-sided"): Defines the alternative hypothesis. Valid options: "two-sided", "less", "greater".
Returns (list[list]): 2D list [[statistic, p_value]], or error message string.
Examples
Example 1: Demo case 1
Inputs:
| x | | | | | somersd_alternative |
|---|---|---|---|---|---|
| 27 | 25 | 14 | 7 | 0 | two-sided |
| 7 | 14 | 18 | 35 | 12 | |
| 1 | 3 | 2 | 7 | 17 | |
Excel formula:
=SOMERSD({27,25,14,7,0;7,14,18,35,12;1,3,2,7,17}, , "two-sided")
Expected output:
| statistic | p_value |
|---|---|
| 0.6033 | 0 |
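The statistic in this example can be reproduced from the table counts alone. The sketch below is a plain-Python illustration, not SciPy's algorithm, and the function name `somers_d_from_table` is hypothetical. It treats rows as categories of x and columns as categories of y.

```python
def somers_d_from_table(table):
    """Somers' D(Y|X) from a contingency table (rows: x, columns: y)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows with larger x
                for m in range(n_cols):
                    if m > j:    # larger x and larger y
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # larger x but smaller y
                        discordant += table[i][j] * table[k][m]
    total = sum(map(sum, table))
    all_pairs = total * (total - 1) // 2
    # Pairs drawn from the same row are tied on x.
    tied_on_x = sum(r * (r - 1) // 2 for r in map(sum, table))
    return (concordant - discordant) / (all_pairs - tied_on_x)


table = [[27, 25, 14, 7, 0],
         [7, 14, 18, 35, 12],
         [1, 3, 2, 7, 17]]
print(round(somers_d_from_table(table), 4))  # 0.6033
```

Here there are 7982 concordant and 1317 discordant pairs, and 11048 of the 17766 possible pairs differ in x, giving 6665 / 11048.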
Example 2: Demo case 2
Inputs:
| x | y | somersd_alternative |
|---|---|---|
| 1 | 1 | two-sided |
| 2 | 2 | |
| 3 | 3 | |
| 4 | 4 | |
| 5 | 5 | |
Excel formula:
=SOMERSD({1;2;3;4;5}, {1;2;3;4;5}, "two-sided")
Expected output:
| statistic | p_value |
|---|---|
| 1 | 0 |
Example 3: Demo case 3
Inputs:
| x | y | somersd_alternative |
|---|---|---|
| 1 | 5 | two-sided |
| 2 | 4 | |
| 3 | 3 | |
| 4 | 2 | |
| 5 | 1 | |
Excel formula:
=SOMERSD({1;2;3;4;5}, {5;4;3;2;1}, "two-sided")
Expected output:
| statistic | p_value |
|---|---|
| -1 | 0 |
Example 4: Demo case 4
Inputs:
| x | y | somersd_alternative |
|---|---|---|
| 1 | 3 | two-sided |
| 2 | 1 | |
| 3 | 5 | |
| 4 | 2 | |
| 5 | 4 | |
Excel formula:
=SOMERSD({1;2;3;4;5}, {3;1;5;2;4}, "two-sided")
Expected output:
| statistic | p_value |
|---|---|
| 0.2 | 0.3613 |
Python Code
import math

from scipy.stats import somersd as scipy_somersd


def somersd(x, y=None, somersd_alternative='two-sided'):
    """
    Calculate Somers' D, an asymmetric measure of ordinal association between two variables.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.somersd.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): Either a 2D list of rankings (as a column vector) or a 2D contingency table.
        y (list[list], optional): 2D list of rankings (as a column vector), same number of rows as x. Ignored if x is a contingency table. Default is None.
        somersd_alternative (str, optional): Defines the alternative hypothesis. Valid options: 'two-sided', 'less', 'greater'. Default is 'two-sided'.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    # Validate alternative parameter
    if somersd_alternative not in ['two-sided', 'less', 'greater']:
        return "Invalid input: somersd_alternative must be 'two-sided', 'less', or 'greater'."

    # Validate x
    if not isinstance(x, list) or len(x) < 2:
        return "Invalid input: x must be a 2D list with at least two rows."

    # x is a contingency table if every row has more than one column;
    # otherwise it is treated as a column vector
    is_contingency = all(isinstance(row, list) and len(row) > 1 for row in x)

    try:
        if is_contingency:
            # Validate all elements are numbers (raises on non-numeric input)
            for row in x:
                for val in row:
                    float(val)
            result = scipy_somersd(x, alternative=somersd_alternative)
        else:
            # x and y must be column vectors
            x_vec = [float(row[0]) if isinstance(row, list) else float(row) for row in x]
            if y is None:
                return "Invalid input: y must be provided when x is a vector."
            if not isinstance(y, list) or len(y) != len(x):
                return "Invalid input: y must be a 2D list with the same number of rows as x."
            y_vec = [float(row[0]) if isinstance(row, list) else float(row) for row in y]
            result = scipy_somersd(x_vec, y_vec, alternative=somersd_alternative)

        stat = result.statistic
        pval = result.pvalue

        # Validate results
        if stat is None or pval is None:
            return "Invalid result: statistic or pvalue is None."
        if math.isnan(stat) or math.isinf(stat):
            return "Invalid result: statistic is NaN or infinite."
        if math.isnan(pval) or math.isinf(pval):
            return "Invalid result: pvalue is NaN or infinite."

        # Convert NumPy scalars to plain Python floats
        return [[float(stat), float(pval)]]
    except Exception as e:
        return f"Error: {e}"